Accounting for Imputation When Estimating Variances in the Economic Surveys at the Census Bureau
Abstract
Most surveys encounter missing data of one or more types. Some sample units leave some data items blank (item nonresponse). The usual approach is to impute all missing, inconsistent, or otherwise invalid items. This paper considers how treating imputed values as if they were reported affects the variance estimates, and compares strategies to address the problem. Data processing for many of the economic surveys conducted by the Census Bureau has been moved onto a generalized system called the Standard Economic Processing System (StEPS). Methods for estimating variances available in many systems, including StEPS, treat all processed data as if they were reported, ignoring the fact that the imputed values were not observed. As a result, this “naïve” estimator of variance is typically biased; often it underestimates the true variance. Under StEPS, a survey is not restricted to one method of imputation. Some surveys apply a primary method and, if the required variable(s) are not available, revert to a second or even a third method. We call this multi-phased procedure mixed imputation. We have rarely seen it addressed in the literature beyond Shao and Steel (1999) and Full (2000). Our goal is to develop the capability in StEPS to obtain an approximate estimate of variance that takes into account the component due to the imputation of missing or invalid values. In producing this variance estimate, however, several considerations must be balanced: (1) the accuracy of the resulting variance estimates, (2) the ability to generalize the procedure to various types of imputation, (3) the procedure's robustness to the use of mixed imputation, and (4) the ease of implementing the procedure within StEPS. Weighing these constraints, we considered several procedures discussed in the literature and concentrate on two of them. The first is a simple procedure that inflates the naïve variance estimate by a factor that depends on the amount and type of imputation.
Under the second procedure (Kim 2001), we create “pseudo-data,” a second set of responses perturbed enough that commonly applied variance estimation formulae or software will pick up the variability caused by the imputation. Section 2 contains a brief review of the literature and the methods. Some options for imputation under StEPS are described in Section 3. In Sections 4 and 5, respectively, we explore the inflation-factor approach and Kim's method. Finally, in Section 6 we provide results of a simulation study. Many derivations, citations, and other details are left out of this paper, but can be obtained from the author in a separate technical report.
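The inflation-factor idea can be illustrated with a minimal sketch. It is not the factor derived in the paper: it assumes simple random sampling, a negligible sampling fraction, uniform (completely at random) nonresponse, and mean imputation. Under those assumptions the naïve variance of the mean is too small by the factor (n / r) * (n - 1) / (r - 1), where n is the sample size and r the number of reported values. The function names below are hypothetical and are not StEPS routines.

```python
import statistics


def mean_impute(values):
    """Fill missing items (None) with the mean of the reported items.

    Returns the completed list and the number of reported items r.
    """
    reported = [v for v in values if v is not None]
    fill = statistics.fmean(reported)
    completed = [fill if v is None else v for v in values]
    return completed, len(reported)


def naive_variance_of_mean(completed):
    """Naive estimate s^2 / n that treats every value as if it were reported."""
    return statistics.variance(completed) / len(completed)


def inflation_factor(n, r):
    """Assumed factor for mean imputation under SRS with uniform response.

    This is an illustrative textbook-style factor, not the paper's formula.
    """
    return (n / r) * (n - 1) / (r - 1)


def adjusted_variance_of_mean(values):
    """Inflate the naive estimate to account for the imputed items."""
    completed, r = mean_impute(values)
    n = len(completed)
    return naive_variance_of_mean(completed) * inflation_factor(n, r)


# Six sample units, two with item nonresponse.
sample = [10.0, 12.0, None, 14.0, None, 16.0]
completed, r = mean_impute(sample)
print(f"naive:    {naive_variance_of_mean(completed):.4f}")
print(f"adjusted: {adjusted_variance_of_mean(sample):.4f}")
```

In this toy case the adjusted estimate equals the respondent-only estimate s_r^2 / r, which is what the assumed factor is designed to recover. Other imputation methods, or mixed imputation, would need a different factor.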
Related resources
BUREAU OF THE CENSUS STATISTICAL RESEARCH DIVISION REPORT SERIES SRD Research Report Number: CENSUS/SRD/RR-84/06 AN IMPROVED PROCEDURE FOR ESTIMATING THE COMPONENTS OF RESPONSE VARIANCE IN COMPLEX SURVEYS
Fellegi's (1974) improved method for estimating the interviewer component of correlated response variance is extended to a. groups of k interviewer assignments for general multistage survey designs. Using a linear models approach suggestive of Hartley and Rao (1978), the independence of the two estimators of interviewer variance is established and the forms of the variances of the estimators ar...
EDITING AND IMPUTATION FOR ECONOMIC SURVEY DATA by Roderick
This series contains research reports, written by or in cooperation with staff members of the Statistical Research Division, whose content may be of interest to the general statistical research community. The views reflected in these reports are not necessarily those of the Census Bureau nor do they necessarily represent Census Bureau statistical policy or practice. ABSTRACT At the U.S. Bureau ...
How can the American Community Survey (ACS) be used to improve the imputation of Owner-Occupied Rent Expenditures?
There are currently two major agencies, the Bureau of Labor Statistics (BLS) and the Bureau of Economic Analysis (BEA), that produce estimates of the cost of shelter for renters and for owners on a regular basis. In addition, the Census Bureau is conducting a nation-wide survey, the American Community Survey (ACS) of rents and owner costs on a rolling five-year basis. This paper explores the fe...
An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods
Imputation is one of the most common methods of reducing the effects of item nonresponse. Imputation yields a complete data set, which makes it possible to use naïve estimators. Under most common imputation methods, the estimators of the mean and total (imputation estimators) remain unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...
The Dynamics of Plant-level Productivity in U.S. Manufacturing
Recent work in I.O. has emphasized the importance of firm- and plant-level heterogeneity in total factor productivity. In this paper, we estimate establishment-level productivity for the entire U.S. manufacturing sector from 1976 until 1999 by using the Census Bureau’s Longitudinal Research Database combined with the Bureau’s Longitudinal Business Database. We characterize the time series properti...
Publication date: 2002